Memory-based active learning for French broadcast news
Authors
Abstract
Stochastic dependency parsers can achieve very good results when they are trained on large, manually annotated corpora. Active learning aims to reduce this annotation cost by selecting as few sentences as possible while still producing the best possible parser. We propose a new selective sampling function for active learning that exploits two memory-based distances to find a good compromise between parser uncertainty and sentence representativeness. The reduced dependency between the parsing and selection models opens interesting perspectives for future model combination. The approach is validated on a French broadcast news corpus creation task dedicated to dependency parsing. It outperforms the baseline entropy-based uncertainty selective sampling on this task. We plan to extend this work with self- and co-training methods in order to enlarge this corpus and produce the first French broadcast news treebank.
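As a rough illustration of the trade-off the abstract describes, the sketch below ranks unlabeled sentences by a weighted sum of parser uncertainty and sentence representativeness. It is not the paper's method: the abstract does not define its two memory-based distances, so Jaccard similarity over token sets stands in for representativeness, and the entropy of a hypothetical list of parse probabilities stands in for uncertainty. The function names, the `weight` parameter, and the toy data are all assumptions made for this example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def jaccard(a, b):
    """Jaccard similarity between two token collections (assumed stand-in
    for a sentence distance; the paper's distances are not given here)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def representativeness(idx, pool):
    """Mean similarity of pool[idx] to every other sentence in the pool."""
    others = [s for j, s in enumerate(pool) if j != idx]
    if not others:
        return 0.0
    return sum(jaccard(pool[idx], s) for s in others) / len(others)


def select(pool, parse_probs, k, weight=0.5):
    """Return the indices of the k sentences with the highest combined score.

    weight=1.0 reduces to pure entropy-based uncertainty sampling;
    weight=0.0 ranks by representativeness only. Ties go to the lower index.
    """
    # Normalise entropies to [0, 1]; fall back to 1.0 if all are zero.
    max_h = max(entropy(p) for p in parse_probs) or 1.0
    scores = []
    for i in range(len(pool)):
        u = entropy(parse_probs[i]) / max_h
        r = representativeness(i, pool)
        scores.append((weight * u + (1 - weight) * r, i))
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scores[:k]]


# Toy pool of tokenised sentences and hypothetical parse distributions.
pool = [["le", "chat", "dort"], ["le", "chat", "mange"], ["bonjour"]]
parse_probs = [[0.5, 0.5], [0.9, 0.1], [0.5, 0.5]]
chosen = select(pool, parse_probs, k=2)
```

With `weight=1.0` the isolated sentence `["bonjour"]` is selected purely because the parser is unsure about it; lowering `weight` favours sentences that resemble the rest of the pool, which is the kind of compromise the abstract refers to.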
Similar resources
The need to create a media block for the convergence of overseas news networks
As a general diplomacy arm of the Islamic Republic of Iran, VoSiMa has extensive activities in international broadcasting of its radio and television programs. These programs are broadcast in different languages, such as English, French, Azeri, Arabic, and ... for regional and transnational audiences. The large volume of the organization's international activities is in the form of news and new...
Accounting for Prosodic Information to Improve ASR-Based Topic Tracking for TV Broadcast News
The increasing quantity of video material available online requires improved methods to help users navigate such data, among which are topic tracking techniques. The goal of this paper is to show that prosodic information can improve an ASR-based topic tracking system for French TV Broadcast News. To this end, two kinds of prosodic information, extracted with and without a learning phase, are...
French Broadcast News Transcription
We describe a French broadcast news transcription system created in the scope of the CIMWOS project [1]. We collected a corpus based on two French and one Belgian TV stations. This corpus forms the base of various system components, such as ASR and Speaker ID. We discuss a few problems posed to speech recognition by characteristics of the French language and approaches to solve them. Finally we...
Study of Numerical Processing Speed, Implicit and Explicit Memory, Active and Passive Memory, Conservation Abilities, and Visual-Spatial Skills of Students with Dyscalculia
Background and Purpose: Learning disorder is one of the common disorders in students, which can lead to the occurrence of educational problems and secondary disorders in them. Based on psychopathological criteria, dyscalculia is one of the subcategories of learning disorder. Children with this disorder have problems in perception of spatial relations and in different cognitive abilities. Theref...
VOXALEAD: A Scalable Video Search Engine Based On Content
Most news organizations provide immediate access to topical news broadcasts through RSS streams or podcasts. Until recently, applications have not permitted a user to perform content based search within a longer spoken broadcast to find the segment that might interest them. Recent progress in both automatic speech recognition (ASR) and natural language processing (NLP) has produced robust tools...